Background: Knowing the three-dimensional (3D) structure of chromatin is important for obtaining a complete picture of the regulatory landscape. Changes in the 3D structure have been implicated in diseases. While there exist approaches that attempt to predict long-range chromatin interactions, they focus only on interactions between specific genomic regions (promoters and enhancers), neglecting other possibilities, for instance, the so-called structural interactions involving intervening chromatin.
Results: We present a method that can be trained on 5C data using the genetic sequence of the candidate loci to predict potential genome-wide interaction partners of a particular locus of interest. We have built locus-specific support vector machine (SVM)-based predictors using the oligomer distance histograms (ODH) representation. The method shows good performance, with a mean test AUC (area under the receiver operating characteristic (ROC) curve) of 0.7 or higher for various regions across the cell lines GM12878, K562 and HeLa-S3. In cases where a locus did not have sufficient candidate interaction partners for model training, we employed multitask learning to share knowledge between models of different loci. In this scenario, across the three cell lines, the method attained an average performance increase of 0.09 in the AUC. When the models trained on 5C data are evaluated on an independent high-resolution Hi-C dataset (which is a rather hard problem), they achieve an AUC of 0.56 on average. Additionally, we have developed new, intuitive visualization methods that enable interpretation of the sequence signals that contributed towards the prediction of locus-specific interaction partners. The analysis of these sequence signals suggests a potential general role of short tandem repeat sequences in genome organization.
Conclusions: We demonstrated how our approach can 1) provide insights into sequence features of locus-specific interaction partners, and 2) identify their cell-line specificity. That our models deem short tandem repeat sequences discriminative for the prediction of potential interaction partners suggests that they could play a larger role in genome organization. Thus, our approach can (a) help to broadly understand, at the sequence level, chromatin interactions and higher-order structures like (meta-) topologically associating domains (TADs); (b) be used to study regions omitted from existing prediction approaches using various information sources (e.g., epigenetic information); and (c) improve methods that predict the 3D structure of chromatin.
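To make the oligomer distance histograms (ODH) representation named in the Results more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes the common ODH formulation in which, for every ordered pair of k-mers and every distance d up to a maximum, one counts how often the first k-mer starts at position i and the second at position i + d. The function name `odh_features`, the parameters `k` and `max_dist`, and their default values are hypothetical; the abstract does not state the oligomer length, distance range, or normalization actually used.

```python
from itertools import product

ALPHABET = "ACGT"


def odh_features(seq, k=1, max_dist=3):
    """Return a flat list of ODH counts for a DNA sequence.

    Assumed definition (not confirmed by the abstract): entry (a, b, d)
    counts positions i where k-mer a starts at i and k-mer b starts at i + d,
    for d in 0..max_dist.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: j for j, kmer in enumerate(kmers)}
    n = len(kmers)
    counts = [0] * (n * n * (max_dist + 1))

    seq = seq.upper()
    # k-mer index at each start position; None if the window has a non-ACGT symbol
    starts = [index.get(seq[i:i + k]) for i in range(len(seq) - k + 1)]

    for i, a in enumerate(starts):
        if a is None:
            continue
        for d in range(max_dist + 1):
            if i + d >= len(starts):
                break  # second k-mer would start past the end of the sequence
            b = starts[i + d]
            if b is None:
                continue
            # flat layout: (first k-mer, second k-mer, distance)
            counts[(a * n + b) * (max_dist + 1) + d] += 1
    return counts
```

Under these assumptions, each candidate locus becomes a fixed-length vector of 4^k × 4^k × (max_dist + 1) counts regardless of its sequence length, which is what allows a standard classifier such as an SVM to be trained on the sequences directly.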